Version info:

Imports

Read the whole dataset and reduce it to what we are interested in

Reduce to the years we are interested in

Define the variable only_granted (used only to build plot filenames)

Replace NaNs in person_ctry_code with ' '

Dataset integrity check

Infer our time frame from data

Assess missing values situation

Check the percentage of missing values in person_ctry_code

Get the application authority distribution for rows with missing person_ctry_code

Share of applications with missing person_ctry_code that are filed in China

Share of applications with missing person_ctry_code that are filed in Japan

Share of applications with missing person_ctry_code that are filed in the rest of the world
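
The missing-value checks above reduce to two ratios: the share of rows without a country code, and how those rows split across application authorities. A minimal sketch in plain Python, assuming each row is a dict with a `person_ctry_code` field and an authority field; the name `appln_auth` is an assumption (it follows PATSTAT naming but is not confirmed by this notebook), and a blank code such as the `' '` placeholder counts as missing.

```python
from collections import Counter


def missing_country_summary(rows, country_key="person_ctry_code", authority_key="appln_auth"):
    """Return (share of rows with a missing country code, authority shares among those rows).

    A code counts as missing if it is None or blank (e.g. the ' ' placeholder that
    replaced NaNs). authority_key="appln_auth" is an assumed column name.
    """
    missing = [r for r in rows if not (r.get(country_key) or "").strip()]
    share_missing = len(missing) / len(rows) if rows else 0.0
    counts = Counter(r[authority_key] for r in missing)
    total = sum(counts.values())
    auth_shares = {auth: n / total for auth, n in counts.items()} if total else {}
    return share_missing, auth_shares
```

The rest-of-world share is then `1 - auth_shares.get("CN", 0) - auth_shares.get("JP", 0)`.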

Check how many countries are involved in this analysis

Define the core function of this notebook

Separation of patent families with applicants from only one country vs. international collaborations
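
One way to draw this line: collect the distinct non-blank applicant country codes per family, and call a family national if there is exactly one code and international if there are several. A sketch under that assumption; the family key `docdb_family_id` is hypothetical here, and families without any known country fall into neither group.

```python
from collections import defaultdict


def split_national_international(rows, family_key="docdb_family_id",
                                 country_key="person_ctry_code"):
    """Split family IDs into national (one applicant country) and international (several).

    family_key is an assumed column name; rows are dicts, one per applicant.
    """
    countries = defaultdict(set)
    for row in rows:
        code = (row.get(country_key) or "").strip()  # None and ' ' count as unknown
        if code:
            countries[row[family_key]].add(code)
    national = {fam for fam, codes in countries.items() if len(codes) == 1}
    international = {fam for fam, codes in countries.items() if len(codes) > 1}
    return national, international
```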

Whole time period

Split into two time periods

Get the dataframes

Plot national/international counts

Grouped bar plot comparing percentages from the first and second time periods

Count patents for the whole time period

Prepare plot for countries' totals over the whole time span

Get all countries involved in this analysis sorted by their IPF counts

Continue preparing the plot

Count patents for each year

Yearly totals plot

Get IPF increase from year 2000 to year 2019

Check which split into two time slices makes the most sense in terms of IPF counts
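
The criterion for this check is not spelled out here. One plausible criterion, assumed purely for illustration, is to pick the year that makes the IPF totals of the two slices as equal as possible:

```python
def balanced_split_year(yearly_counts):
    """Return the first year of the second slice that best balances the two slices' totals.

    yearly_counts: {year: IPF count}. Balancing totals is an assumed criterion.
    """
    years = sorted(yearly_counts)
    total = sum(yearly_counts.values())
    best_year, best_gap = None, float("inf")
    running = 0  # total of the first slice, i.e. all years before years[i]
    for i in range(1, len(years)):
        running += yearly_counts[years[i - 1]]
        gap = abs(running - (total - running))
        if gap < best_gap:
            best_year, best_gap = years[i], gap
    return best_year
```

Because yearly counts grow over time, a count-balanced split tends to fall later than the midpoint of the period.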

Compute the mean year-over-year increase
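
"Mean year-over-year increase" can mean two different things: the arithmetic mean of the yearly growth rates, or the compound annual growth rate (CAGR). A sketch of both, assuming a chronologically ordered list of yearly IPF counts without zeros; which one the notebook uses is not stated here.

```python
def mean_yoy_growth(values):
    """Arithmetic mean of the year-over-year growth rates of an ordered series."""
    rates = [(cur - prev) / prev for prev, cur in zip(values, values[1:])]
    return sum(rates) / len(rates)


def cagr(values):
    """Compound annual growth rate: the constant yearly rate linking first and last value."""
    periods = len(values) - 1
    return (values[-1] / values[0]) ** (1 / periods) - 1
```

The two can disagree sharply: for the series 100, 200, 100 the arithmetic mean reports +25% per year, while the CAGR reports 0%.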

Yearly, by country plot

Relative to year 2000 plot (for instance, for South Korea: divide each year's value by South Korea's first year's value)
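
The normalization described above divides each country's series by its own first-year value, so every curve starts at 1.0. A minimal sketch, assuming series are stored as `{country: {year: count}}` dicts (a hypothetical layout) and the first year has a non-zero count:

```python
def relative_to_first_year(series_by_country):
    """Divide each country's yearly values by that country's value in its first year."""
    result = {}
    for country, series in series_by_country.items():
        base = series[min(series)]  # value in the earliest year (2000 in this notebook)
        result[country] = {year: value / base for year, value in series.items()}
    return result
```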

China's share plot

Herfindahl–Hirschman index plot

https://en.wikipedia.org/wiki/Herfindahl%E2%80%93Hirschman_Index
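
China's share and the HHI both start from the same per-year country shares: China's share is a single entry, and the HHI is the sum of all squared shares. A sketch on fractional shares, so the index ranges from 1/N (N countries with equal shares) up to 1 (a single country); multiplying by 10,000 gives the scale used when shares are expressed in percent. The `{country: count}` input layout is an assumption.

```python
def shares(counts):
    """Each country's share of the total count (fractions summing to 1)."""
    total = sum(counts.values())
    return {country: n / total for country, n in counts.items()}


def hhi(counts):
    """Herfindahl–Hirschman index on fractional shares: sum of squared shares."""
    return sum(s ** 2 for s in shares(counts).values())
```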

Per labor force plot

Create a labor force dict
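
Scaling by labor force is a plain division. A sketch, assuming the labor force dict maps country codes to absolute worker counts (not thousands or millions), so the result reads as IPFs per 1M workers; countries missing from the dict are skipped.

```python
def per_million_workers(ipf_counts, labor_force):
    """IPFs per 1M workers; labor_force holds absolute worker counts per country."""
    return {
        country: n / labor_force[country] * 1_000_000
        for country, n in ipf_counts.items()
        if country in labor_force  # countries without labor force data are skipped
    }
```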

Side research: compute the mean year-over-year increase for China

Per labor force plot with logarithmic scale

Per labor force plot - a close-up that excludes South Korea and Japan

Continents plot

Get each continent's population share in 2019 and on average (2000-2019)

Continents per 1M workers plot
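
For continents, one natural choice, assumed here, is a ratio of sums: add up the IPFs and the workers of all countries on a continent, then divide. This weights each country by its labor force, unlike averaging the countries' per-worker values. The `continent_of` mapping is hypothetical.

```python
from collections import defaultdict


def continent_per_million(ipf_counts, labor_force, continent_of):
    """IPFs per 1M workers per continent, computed as a ratio of sums (assumed method)."""
    ipfs = defaultdict(float)
    workers = defaultdict(float)
    for country, n in ipf_counts.items():
        continent = continent_of[country]
        ipfs[continent] += n
        workers[continent] += labor_force[country]
    return {c: ipfs[c] / workers[c] * 1_000_000 for c in ipfs}
```

For a continent with one country at 100 IPFs and 1M workers and another at 0 IPFs and 9M workers, the ratio of sums gives 10 IPFs per 1M workers, while the mean of the two per-worker values would give 50.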

Per labor force plot, sorted by values after scaling

For comparison: the plot from above that simply shows the top 8 countries by total patent family output

Relative to year 2000 plot (for instance, for South Korea: divide each year's value by South Korea's first year's value)

Plot that shows the top 6 countries from both rankings (sorted by total and sorted by values after scaling)
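
Selecting the top 6 from both rankings amounts to a union of two top-k lists that keeps first-seen order. A sketch, assuming both rankings are dicts from country to value; `top_k_union` is a hypothetical helper name.

```python
def top_k_union(by_total, by_scaled, k=6):
    """Countries in the top k of either ranking, without duplicates, in first-seen order."""
    def top(values):
        return sorted(values, key=values.get, reverse=True)[:k]

    selected = []
    for country in top(by_total) + top(by_scaled):
        if country not in selected:
            selected.append(country)
    return selected
```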

Plots that separate different battery technologies

Create technology dataframes

Global plot

Countries' plots

Clustering countries using their technology distributions

Define functions for assessment, clustering, and plotting
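
The clustering operates on each country's technology distribution, i.e. its IPF counts per battery technology normalized to shares. A self-contained sketch of plain Lloyd's k-means on such share vectors; the notebook may well use a library implementation (e.g. scikit-learn's `KMeans`) instead, and all helper names here are hypothetical.

```python
import random


def to_shares(tech_counts):
    """Normalize one country's per-technology IPF counts to shares that sum to 1.

    Assumes the country has at least one IPF (total > 0).
    """
    total = sum(tech_counts)
    return [n / total for n in tech_counts]


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(point, centroids):
    """Index of the centroid closest to point."""
    return min(range(len(centroids)), key=lambda j: sq_dist(point, centroids[j]))


def kmeans(points, k, n_iter=100, seed=0):
    """Lloyd's algorithm. Returns (labels, centroids); labels[i] is the cluster of points[i]."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]  # k randomly chosen points as start
    labels = None
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [nearest(p, centroids) for p in points]
        if new_labels == labels:
            break  # assignments unchanged: converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids
```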

2010-2019

Find 2 clusters

Find 3 clusters

Run k-means n times to compute affiliation distributions
Get countries that are always together
Compute each country's cluster affiliation distribution and build the most probable clusters
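
Because k-means depends on its random initialization and cluster labels are arbitrary from run to run, one label-free way to find countries that are always together is to count, for each pair, the fraction of runs that put them in the same cluster. A sketch, assuming each run is stored as a dict from country to cluster label; this co-clustering count is an assumed technique, not necessarily how the notebook does it.

```python
from itertools import combinations


def co_clustering_frequency(runs):
    """Fraction of runs in which each pair of countries shares a cluster.

    runs: list of dicts mapping country -> cluster label, one dict per k-means run.
    """
    countries = sorted(runs[0])
    freq = {}
    for a, b in combinations(countries, 2):
        together = sum(run[a] == run[b] for run in runs)
        freq[(a, b)] = together / len(runs)
    return freq


def always_together(runs):
    """Pairs of countries that land in the same cluster in every run."""
    return [pair for pair, f in co_clustering_frequency(runs).items() if f == 1.0]
```

Comparing `run[a] == run[b]` rather than the labels themselves makes the result immune to label switching between runs.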

Create LaTeX code

Find 4 clusters

Find 5 clusters

Find 6 clusters

2000-2009 and 2010-2019

2000-2009

Find 2 clusters
Find 3 clusters
Find 5 clusters

2010-2019

Find 2 clusters
Find 3 clusters
Find 4 clusters

Collect all final plots in one place

Not in the paper, but useful for getting an overview of all technologies' developments